BUG: Don't error with empty Series for .isin #17006

gfyoung · 2017-07-18T08:18:18Z

An empty Series is initialized with float64 dtype, even when the data being checked with .isin has object dtype, which leads to an error during the membership test.

Closes #16991.
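Below is a minimal reproduction sketch of the scenario. It assumes a current pandas and NumPy install and is not the test added in this PR. An object-dtype Series is checked against empty containers whose own dtype is not object. In the pandas versions of this era, `pd.Series()` defaulted to float64 (recent versions default to object), so an explicitly typed float64 Series stands in for it here. With the fix, each call should return an all-False boolean Series instead of raising.

```python
import numpy as np
import pandas as pd

# Object-dtype Series whose membership is tested against empty containers.
s = pd.Series(["a", "b"], dtype=object)

# Empty list-likes whose own dtype is not object: the NumPy array and the
# Series are float64, and an empty list has no elements to infer a dtype from.
empties = {
    "list": [],
    "ndarray": np.array([]),
    "float64 Series": pd.Series([], dtype="float64"),
}

# Nothing can be a member of an empty container, so every result should be
# an all-False boolean Series aligned with `s`.
results = {name: s.isin(empty) for name, empty in empties.items()}
for name, result in results.items():
    print(f"{name}: {result.tolist()}")
```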


codecov · 2017-07-18T08:45:46Z

Codecov Report

Merging #17006 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17006      +/-   ##
==========================================
+ Coverage   90.99%      91%   +<.01%     
==========================================
  Files         161      161              
  Lines       49291    49293       +2     
==========================================
+ Hits        44852    44857       +5     
+ Misses       4439     4436       -3

Flag	Coverage Δ
#multiple	`88.77% <100%> (+0.02%)`	⬆️
#single	`40.18% <100%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.43% <100%> (+0.01%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.76% <0%> (-0.1%)`	⬇️
pandas/plotting/_converter.py	`65.05% <0%> (+1.81%)`	⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fcb0263...c2cec77.

jreback · 2017-07-18T09:17:32Z

pandas/core/algorithms.py

@@ -65,6 +65,8 @@ def _ensure_data(values, dtype=None):

    # we check some simple dtypes first
    try:
+        if is_object_dtype(dtype):


Check perf on some of the algos, especially isin.

I didn't see any noticeable perf degradations on my machine.

Did you run the asv benchmarks? This can degrade performance, as it's in a critical path. Notice there is already a check for object type later on. This function receives LOTS of input.
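For reference, here is a hypothetical asv-style benchmark for the `isin` path under discussion. It assumes asv's usual conventions: `setup` runs before timing, methods prefixed `time_` are timed, and `params`/`param_names` drive parametrization. The class name and sizes are made up, and this is not a benchmark from pandas' own `asv_bench` suite.

```python
import pandas as pd


class IsInObjectSketch:
    # Hypothetical benchmark: object-dtype Series.isin against value lists
    # of different sizes, including the empty case touched by this PR.
    params = [0, 10, 1_000]
    param_names = ["n_values"]

    def setup(self, n_values):
        # 100k distinct strings stored with object dtype.
        self.s = pd.Series([f"item_{i}" for i in range(100_000)], dtype=object)
        # The first n_values of them, so exactly n_values rows should match.
        self.values = [f"item_{i}" for i in range(n_values)]

    def time_isin(self, n_values):
        # asv times this call; the result itself is discarded.
        self.s.isin(self.values)
```

Placed in a file under `asv_bench/benchmarks/`, something like `asv continuous -f 1.1 upstream/master HEAD -b IsInObjectSketch` (assuming a remote named `upstream`) should compare master against the branch and flag timings that changed by more than a factor of 1.1.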

Not quite: the existing check is for the values. This is for the dtype specified. This is essentially an O(1) operation.

And yes, I did check performance (see my comment above), and I didn't see any issues on my machine.
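To make the values-versus-dtype distinction concrete, here is a simplified, hypothetical model of the control flow. It is not pandas' actual `_ensure_data`, which handles many more dtypes. It illustrates only the point of the `is_object_dtype(dtype)` check in the hunk above: when the caller asks for object dtype, the values are coerced to object before their own default dtype (float64 for an empty list) gets a say. Checking the requested `dtype` costs the same regardless of how long `values` is. Inspecting the values is what scales with input size.

```python
import numpy as np


def ensure_data_sketch(values, dtype=None):
    """Hypothetical, simplified stand-in for an internal _ensure_data.

    In this sketch the requested dtype is consulted only for the object case.
    """
    arr = np.asarray(values)
    # Cheap check on the requested dtype first: independent of len(values).
    if dtype is not None and np.dtype(dtype) == np.dtype(object):
        return arr.astype(object), "object"
    # Otherwise fall back to the values' own dtype.
    if arr.dtype.kind == "f":
        return arr.astype(np.float64), "float64"
    if arr.dtype.kind in "iu":
        return arr.astype(np.int64), "int64"
    return arr.astype(object), "object"


def isin_sketch(comps, values):
    # Coerce the lookup values to whatever dtype the compared data ended up as.
    comps_arr, comps_dtype = ensure_data_sketch(comps)
    values_arr, _ = ensure_data_sketch(values, dtype=comps_dtype)
    lookup = set(values_arr.tolist())
    return np.array([x in lookup for x in comps_arr.tolist()], dtype=bool)
```

In pandas, the returned dtype decides which typed hashtable routine performs the lookup, which is presumably why a float64 array reaching the object path errored. The plain `set` used here sidesteps that detail.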

jreback · 2017-07-19T00:38:21Z

ok merge away

FYI, you would think the checks are O(1), but the current implementation is not great, and repeated calls or misordered checks can cause a lot of drag.
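Here is a rough sketch of the kind of drag being described, assuming `pandas.api.types` is available. It compares a chain of public dtype predicates, each of which inspects its argument from scratch, against normalizing once with `np.dtype` and branching on the `kind` code. Both helpers are hypothetical and only handle plain NumPy dtypes; extension dtypes such as `CategoricalDtype` would make `np.dtype` raise. Timings vary by machine, so measure rather than assume which is faster.

```python
import timeit

import numpy as np
from pandas.api.types import is_float_dtype, is_integer_dtype, is_object_dtype


def classify_with_predicates(dtype):
    # Hypothetical helper: each predicate inspects `dtype` independently,
    # so a miss on the early checks pays for every later one as well.
    if is_object_dtype(dtype):
        return "object"
    if is_integer_dtype(dtype):
        return "integer"
    if is_float_dtype(dtype):
        return "float"
    return "other"


def classify_with_kind(dtype):
    # Hypothetical helper: normalize once, then branch on the one-character
    # kind code. Only valid for plain NumPy dtypes.
    kind = np.dtype(dtype).kind
    if kind == "O":
        return "object"
    if kind in "iu":
        return "integer"
    if kind == "f":
        return "float"
    return "other"


if __name__ == "__main__":
    # float64 is checked last in both helpers, i.e. the worst-ordered case here.
    target = np.dtype(np.float64)
    for fn in (classify_with_predicates, classify_with_kind):
        seconds = timeit.timeit(lambda: fn(target), number=10_000)
        print(f"{fn.__name__}: {seconds:.4f}s for 10,000 calls")
```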

gfyoung · 2017-07-19T00:44:52Z


Ah, that's fair. I will keep an eye out for this and will revisit it if other people experience a significant degradation in some way that isn't caught by the tests.

@TomAugspurger

* PKG: Added pyproject.toml for PEP 518 (pandas-dev#16745) Declaring build-time requirements: https://www.python.org/dev/peps/pep-0518/ * DOC: Update Overview page in documentation (pandas-dev#17368) * Update Overview page in documentation * DOC Revise Overview page * DOC Make further revisions in Overview webpage * Update overview.rst Remove references to Panel * API: Have MultiIndex consturctors always return a MI (pandas-dev#17236) * API: Have MultiIndex constructors return MI This removes the special case for MultiIndex constructors returning an Index if all the levels are length-1. Now this will return a MultiIndex with a single level. This is a backwards incompatabile change, with no clear method for deprecation, so we're making a clean break. Closes pandas-dev#17178 * fixup! API: Have MultiIndex constructors return MI * Update for comments
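
Several entries in the list above change what pandas returns for empty or degenerate inputs. The sketch below checks three of them, assuming a recent pandas release (exact reprs differ between versions): empty `Categorical` categories now get object dtype (pandas-dev#17249), a single-level `MultiIndex` constructor call still returns a `MultiIndex` (pandas-dev#17236), and a NaN `clip` threshold is treated like `None` instead of raising (pandas-dev#17276). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# pandas-dev#17249: an empty Categorical gets object-dtype categories,
# matching the regular Index constructor, instead of a Float64Index.
empty_cat = pd.Categorical([])
print(empty_cat.categories.dtype)  # object

# pandas-dev#17236: MultiIndex constructors always return a MultiIndex,
# even when only one level is supplied.
mi = pd.MultiIndex.from_arrays([[1, 2, 3]])
print(type(mi).__name__, mi.nlevels)  # MultiIndex 1

# pandas-dev#17276: a NaN threshold means "no bound" (like None), so only
# the upper bound of 6.0 is applied here.
s = pd.Series([1.0, 5.0, 10.0])
clipped = s.clip(lower=np.nan, upper=6.0)
print(clipped.tolist())  # [1.0, 5.0, 6.0]
```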

An empty Series is initialized as float64, even when the data being checked with .isin has object dtype, and this dtype mismatch leads to an error during the membership test. Closes gh-16991.
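A minimal reproduction sketch of the scenario, assuming a current pandas install. Since pandas 2.0 an empty `pd.Series([])` defaults to object dtype rather than float64, so the sketch passes `dtype="float64"` explicitly to mimic the old default; with the fix in place, membership against an empty Series simply comes back all False instead of raising.

```python
import pandas as pd

# An object-dtype Series whose elements we test for membership.
s = pd.Series(["a", "b"])

# Older pandas gave an empty Series float64 dtype by default; request it
# explicitly, because pandas >= 2.0 defaults empty Series to object dtype.
empty = pd.Series([], dtype="float64")

# Nothing can be a member of an empty collection, so every entry is False.
result = s.isin(empty)
print(result.tolist())  # [False, False]
```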

BUG: Don't error with empty Series for .isin (c2cec77)

gfyoung added the Bug and Indexing labels Jul 18, 2017

gfyoung added this to the 0.21.0 milestone Jul 18, 2017

gfyoung changed the title from "BUG: Don't with empty Series for .isin" to "BUG: Don't error with empty Series for .isin" Jul 18, 2017

jreback requested changes Jul 18, 2017

gfyoung merged commit e5de21a into pandas-dev:master Jul 19, 2017

gfyoung deleted the isin-series-fail branch July 19, 2017 02:55

JohnnyC08 mentioned this pull request Dec 22, 2018 in scikit-learn-contrib/category_encoders#153, "Ordinal encoder support new handle unknown handle missing" (merged)


gfyoung commented Jul 18, 2017

codecov bot commented Jul 18, 2017

jreback Jul 18, 2017

gfyoung Jul 18, 2017

jreback Jul 18, 2017

gfyoung Jul 19, 2017

jreback commented Jul 19, 2017

gfyoung commented Jul 19, 2017
